Abstract: We will briefly outline a computational theory of the first stages of human vision according 
to which (a) the retinal image is filtered by a set of centre-surround receptive fields (of about 5 
different spatial sizes) which are approximately bandpass in spatial frequency and (b) zero-crossings 
are detected independently in the output of each of these channels. Zero-crossings in each channel 
are then a set of discrete symbols which may be used for later processing such as contour extraction 
and stereopsis. A formulation of Logan’s zero-crossing results is proved for the case of Fourier 
polynomials and an extension of Logan’s theorem to 2-dimensional functions is also proved. Within this 
framework, we shall describe an experimental and theoretical approach (developed by one of us with 
M. Fahle) to the problem of visual acuity and hyperacuity of human vision. The positional accuracy 
achieved, for instance, in reading a vernier is astonishingly high, corresponding to a fraction of the 
spacing between adjacent photoreceptors in the fovea. Stroboscopic presentation of a moving object 
can be interpolated by our visual system into the perception of continuous motion; and this 
“spatiotemporal” interpolation also can be very accurate. It is suggested that the known spatiotemporal 
properties of the channels envisaged by the theory of visual processing outlined above implement an 
interpolation scheme which can explain human vernier acuity for moving targets. 

We consider, in particular, the problem of avoiding aliasing in the perifoveal visual field. It is 
conjectured that gap junctions (or another form of coupling) between rods and cones are needed to 
avoid aliasing outside the fovea. Possible implications for machine vision and imaging devices are 
briefly discussed. 
In the last seven years a new computational approach has led to promising advances in the 
understanding of visual perception. This approach, which may be relevant not only for the information 
sciences but also for the neurosciences, is mainly due to the late D. Marr and his colleagues. In this 
article we will briefly describe this computational theory for the very first stages of vision, since it 
provides a useful framework for approaching the problem of spatiotemporal acuity in human vision, 
which is the main topic of the paper. 1 


1.1 A Computational Approach 

The central tenet of this approach is that vision is primarily a complex information processing task, 
with the goal of capturing and representing the various aspects of the world that are of use to us. It is a 
feature of such tasks, arising from the fact that the information processed in a machine is only loosely 
constrained by the physical properties of the machine, that they must be understood at different, 
though interrelated, levels. This framework, formulated by Marr & Poggio (1976), was not new: H. 
Simon and especially L. Harmon emphasized a similar point of view in a more general context. 

In a process like vision it is useful to distinguish three levels over which one’s descriptions and 
explanations of the process must range: a) computational theory, b) algorithm, c) implementation. 
These are not hard and fast divisions. The important point is that no explanation or set of 
explanations is complete unless it covers this range. To avoid possible misunderstandings, we wish to stress 
that this computational approach is not a substitute for the “traditional” methods and techniques 
of the neurosciences to which it is in fact complementary. It is probably fair to say that most 
physiologists and students of psychophysics have often approached a specific problem in visual 
perception with their personal “computational” prejudices about the goal of the system and why it does 
what it does. With few exceptions this heuristic attitude, although useful, remained at the level of 
prejudices; computational analysis was not a science, nor was it appreciated in the neurosciences that 
one was needed. 

1 Some of the material for this paper has been drawn from Poggio (1981) and Fahle and Poggio (1981). 
This state of affairs is hardly surprising. The difficulties of the vision process are often not 
appreciated even now. Until the early 70's the field of computer science and artificial intelligence failed 
to realise that problems in vision are difficult. The reason, of course, is that we are extremely good at 
it, but in a way which cannot be subjected to careful introspection. Today we know that the problems 
are profound. “Ad hoc” methods and tricks have consistently failed. Marr realized what the message 
was. A science of visual information processing was needed to analyze a given information processing 
task and its basis in the physical world. Marr’s work, from the breadth of the approach to its rigorous 
detail in the analysis of specific problems, provides a methodological lesson for this new field. 


1.2 The Detection of Intensity Changes 

In this section we will outline one of the very first stages in the processing of visual information, the 
computation of zero-crossings. The basic ideas, outlined by Marr in a paper (1976), have evolved 
into a scheme (Marr & Poggio, 1977) based on bandpass filtering of the image through difference 
of gaussians and detection of the associated zero-crossings. Marr and Hildreth (1980) have provided 
a number of attractive arguments for justifying this scheme from a computational point of view, 
although a complete formal theory is still lacking. We will outline here their main points. 

The goal of the first step of vision is to detect changes in the reflectance of the physical surfaces 
around the viewer or in the surface orientation and distance. On various computational grounds, 
sharp changes in the image intensity turn out to be the best indicator of most physical changes in the 
surface. In natural images, intensity changes can and do occur over a wide range of spatial scales. It 
follows that their optimal detection requires the use of operators (that is, filters) of different sizes. A 
sudden intensity change like an edge gives rise to a maximum or a minimum in the first derivative 
of image intensities or equivalently to a zero-crossing in the second derivative. Marr and Hildreth 
(1980) argue that the desired filter should take the second derivative of the image at a particular scale. 
A convenient choice for the derivative in two dimensions is the Laplacian 
$\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$ and 
the appropriate scale can be set by filtering the image with a 2-D Gaussian filter $G$, which optimally 
satisfies specific constraints on the real world, particularly the fact that intensity changes arising from 
physical objects are spatially localized at their own scale. Since the operations of taking the derivative 
and blurring an image are linear, the overall transformation is equivalent to convolving the image 
with the Laplacian of a gaussian distribution, that is with $\nabla^2 G$. As shown by fig. 1, this corresponds to 
a centre-surround type of receptive field. Such a filter closely resembles the usual descriptions of the 
ganglion cell receptive field and of the psychophysical channels in human vision as the difference of 
two gaussians, an excitatory and an inhibitory one. Spatial filters with the centre-surround 
organization shown in fig. 1 are of course bandpass in spatial frequency, although their bandwidth is not very 
narrow. 

Figure 1. A cross-section of the circularly symmetric centre-surround receptive field $\nabla^2 G$. 

In summary, the process of finding intensity changes at a given scale consists of filtering the image 
with a centre-surround type of receptive field, with a size reflecting the scale at which the changes 
have to be detected, and then locating the zero-crossings in the filtered image (see fig. 2). 

To detect changes at all scales, it is necessary only to add other channels, of different dimension, 
and carry out the same computation for each channel independently. 

Figure 2. The image (a) has been convolved with a centre-surround receptive field with the shape illustrated in Fig. 1. 
(b) shows the convolved image: positive values are shown white and negative black; white (black) values would then 
represent the activity of the corresponding on-(off-) centre ganglion cells "looking" at the image. (c) The zero-crossings 
profile contains rich information about the filtered image (b) as explained in the text. Similar independent filters of 
smaller and larger sizes are needed to capture the whole information contained in (a). From Marr and Hildreth (1980). 
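
The filtering and zero-crossing steps summarised above can be illustrated in one dimension. The following is a minimal sketch under stated assumptions, not the implementation used by Marr and Hildreth: it takes a sampled second derivative of a Gaussian as the 1-D analogue of $\nabla^2 G$, and the function names (`log_kernel`, `convolve_valid`, `zero_crossings`), the kernel radius of four standard deviations and the linear interpolation of the crossing position are all illustrative choices.

```python
import math

def log_kernel(sigma, radius=None):
    """Sampled second derivative of a Gaussian (1-D analogue of the LoG).

    With this sign convention the centre is negative and the surround
    positive; the zero-crossing positions do not depend on the sign.
    """
    if radius is None:
        radius = int(math.ceil(4 * sigma))  # assumed truncation radius
    kernel = []
    for i in range(-radius, radius + 1):
        g = math.exp(-i * i / (2.0 * sigma * sigma))
        kernel.append((i * i / sigma**4 - 1.0 / sigma**2) * g)
    # Remove the small residual mean so that uniform regions map to zero.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def convolve_valid(signal, kernel):
    """Convolve only where the kernel lies entirely inside the signal.

    Returns the filtered values and the index in `signal` of the first one.
    """
    r = len(kernel) // 2
    out = []
    for c in range(r, len(signal) - r):
        out.append(sum(kernel[j + r] * signal[c - j] for j in range(-r, r + 1)))
    return out, r

def zero_crossings(values, offset=0, eps=1e-9):
    """Positions of sign changes, located by linear interpolation."""
    crossings = []
    for i in range(len(values) - 1):
        a, b = values[i], values[i + 1]
        if (a > eps and b < -eps) or (a < -eps and b > eps):
            crossings.append(offset + i + a / (a - b))
    return crossings

# One channel applied to an ideal step edge between samples 49 and 50.
step = [0.0] * 50 + [1.0] * 50
filtered, first_index = convolve_valid(step, log_kernel(sigma=2.0))
edge_positions = zero_crossings(filtered, first_index)
```

For this step edge the sketch reports a single zero-crossing midway between the two samples that straddle the edge. Calling the same functions with other values of `sigma` gives the additional, independent channels of different sizes.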


Zero-crossings in each channel thus form a set of discrete symbols which are used for later 
processing such as stereopsis (Marr & Poggio, 1977). Marr and Hildreth, in particular, addressed the problem 
of how to combine zero-crossings from different channels into primitive edge elements taking 
advantage of physical constraints obeyed by the visual world. These and other symbolic descriptors 
then represent what Marr called the “raw primal sketch”. Instead of describing these parts of the 
theory, we shall discuss in more detail the zero-crossing detection process and the corresponding 
physiological and psychophysical evidence. Zero-crossings in the output of centre-surround channels 
represent a natural way of obtaining a discrete, symbolic representation of the image from the original 
“continuous” intensity values. Some recent deep results in complex analysis by B. Logan (1977) seem 
to support this scheme in a way which we have found intriguing and fascinating ever since we came across 
his remarkable paper. His main theorem (see Appendix 1a) states that a bandpass one-dimensional 
signal with a bandwidth of less than 1 octave can be reconstructed completely, up to a constant 
multiplication factor, from its zero-crossings alone (if some relatively weak conditions are satisfied). From 
the point of view of visual information processing there is clearly no need to reconstruct the original 
signal. But the theorem suggests that the “discrete” symbols provided by zero-crossings are very rich 
in information about the original image. Unfortunately, more definite claims are as yet impossible, 
since an extension of the theorem to images (Appendix 1a and especially 1b; see also Marr et al., 
1979) does not characterize completely the two-dimensional problem. In addition, centre-surround 
receptive fields are not ideal bandpass filters, as required by Logan’s version of the theorem (see 
Appendices 1a, 1b). Clearly zero-crossings alone do not contain all the information (such as absolute 
intensity values), but as one of us has found in an empirical investigation, natural images filtered with 
$\nabla^2 G$ operators can be reconstructed to a good approximation from their zero-crossings and slopes. 
A successful extension of the Logan type of analysis to two-dimensional patterns may therefore 
represent one of the critical steps for perfecting this computational analysis of low level vision into a solid 
theory. 


1.3 The Line Detectors/Fourier Analysis Controversy: A New Synthesis? 

The previous ideas based on Logan’s type of results not only lead to a satisfactory scheme for 
the analysis of intensity changes in an image; they also have fascinating implications for visual 
psychophysics and physiology, since they seem to account for basic properties of the first part of the 
visual pathway. In particular these ideas explain why the image is filtered early on by approximately 
bandpass centre-surround receptive fields; they make more precise the notion of “edge-detectors” 
for extracting a symbolic description which contains full information about the image; and they state 
that this can be achieved only if the image was previously filtered with several independent bandpass 
channels — i.e. centre-surround receptive fields. As an immediate consequence these ideas also 
provide a solution of the long-standing controversy about edge-detectors versus frequency channels 
in the psychophysics and physiology of primate vision. The first stage of vision would indeed be 
performed to a good extent by “edge” detectors (actually zero-crossing detectors) and certainly not 
by Fourier analyzers; but in order for the zero-crossing detectors to extract meaningful information 
it is necessary that they operate on the output of independent channels, roughly bandpass in spatial 
frequency. 

Many results from the psychophysics and physiology of early vision can be easily interpreted in this 
new framework. It is, for instance, not too unreasonable to propose that the $\nabla^2 G$ filtering stage is 
performed by ganglion cells of the retina and LGN, whereas a subclass of simple cells may represent 
oriented zero-crossing segments. In this context it is not important how this is implemented in detail: 
one of the several possibilities is that simple cells may read the zero-crossings profile from the fine 
grid of small cells in layer 4C of the striate cortex, where a reconstruction of the filtered image, at 
different scales, may be performed (via intracortical inhibition) with the goal of providing a very 
accurate position of the zero-crossings (see later). 

Several gaps have still to be filled in the computational theory of zero-crossings. For instance, 
since zero-crossings do not represent the complete information about the image, it is important to 
characterize the other primitives that are needed. At the other levels of explanation experimental 
evidence in favour or against zero-crossings is of course highly desirable. Since the summer day in 
Tübingen where D. Marr with one of us first formulated the idea of zero-crossings in the output of 
independent, roughly bandpass filters, we cannot help feeling that its experimental validation (or 
falsification) is of critical importance for further developments of our approach to low-level vision. 
2. Visual information processing: why spatiotemporal interpolation? 

Any visual processor with human-level performance must be capable of analyzing time-varying 
imagery. The analysis starts with the spatio-temporal interpolation of the raw visual input. The spatial 
resolution of the photosensitive image available for processing is limited by the sampling density 
of the photosensitive elements in the sensor and by noise. Image motion introduces the additional 
problem of temporal resolution. The limiting factors are the frame rate and the integration time 
determined by the sensitivity of the photosensitive elements. This is of little consequence for a stationary 
scene, but for moving targets it poses the problem of motion smear. 

The problem of high spatiotemporal resolution can be partially overcome by using better sensors 
with larger arrays and higher frame rate. There are, however, technological and physical limits to 
the spatiotemporal resolution that can be achieved in this manner, since increasing the spatial and 
temporal sampling rate reduces the number of photons per sensor element per cycle. Consider that 
since the number $x$ of photons is Poisson distributed, $\sigma = \sqrt{x}$. The number of distinguishable levels 
was estimated by Barlow (1981) to be roughly $n = 2\sqrt{x}$. Thus 8 bits of resolution ($n = 256$) requires 
about $2^{14} \approx 1.6 \times 10^4$ photons. Note that the light intensity of a bright surface is $10^4$ cd/m$^2$ and this 
means $10^4$ photons per 50 msec per sensor, assuming a sensor efficiency similar to the human cones! 
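
The photon budget can be checked with a few lines of arithmetic. This is only a sketch: it takes Barlow's estimate in the form $n = 2\sqrt{x}$ (an assumption about the exact form intended here), and the function names are hypothetical.

```python
import math

def distinguishable_levels(photons):
    """Rough number of distinguishable levels, assuming n = 2 * sqrt(x)."""
    return 2.0 * math.sqrt(photons)

def photons_needed(levels):
    """Inverse of the same estimate: x = (n / 2) ** 2."""
    return (levels / 2.0) ** 2

# Photons required for 8 bits (256 levels): (256 / 2) ** 2 = 2 ** 14.
photons_for_8_bits = photons_needed(256)

# Levels, and equivalent bits, available from 10**4 photons per sample.
levels_at_1e4 = distinguishable_levels(1e4)
bits_at_1e4 = math.log2(levels_at_1e4)
```

Under this assumption, the $10^4$ photons available from a bright surface in 50 msec give about 200 levels, slightly less than 8 bits, so any further increase in spatial or temporal sampling rate comes directly out of the intensity resolution.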

Fortunately, the performance of a given sensor can be improved by appropriate spatiotemporal 
interpolation schemes. As we have seen, using such processes the human visual system achieves an 
extremely high spatiotemporal resolution compared to the sampling density of the photoreceptors and 
their integration time. 

In summary then, temporal acuity, spatial acuity and motion smear are different facets of the same 
general problem posed to a visual processor by time varying imagery. We turn now to examine how 
the human visual processor deals with it. 


2.1 Visual acuity in human vision 
Since the first measurements of vernier acuity in 1892 by Wuelfing in Tübingen, the extraordinary 
accuracy with which the human eye can estimate the relative positions of lines or other features in 
the visual field has represented a long-standing puzzle in vision research. Acuity of this type, also 
called hyperacuity, can be measured in a variety of situations. A typical example is the acuity found 
in reading a vernier (see inset of fig. 8a). This can be as fine as 5" of arc (Westheimer and McKee, 
1975), that is 0.02 mm at 1 metre distance. The astonishing precision of this performance can be seen 
when the optical properties of the human eye are considered. In the fovea the hexagonal grid of cones 
samples the visual image with a sampling interval of no less than 25", well matched to the optical 
point spread function of the eye (its gaussian core has a half width of about 45", corresponding to a 
spatial frequency of 60 cycles/degree). 

Most remarkably of all, vernier acuity is not affected by movement at constant velocity of the 
target in a velocity range from 0°/sec to at least 4°/sec (Westheimer & McKee, 1975). This means 
that a subject can detect the relative position of two lines to within a fraction of a receptor diameter 
(and spacing) while the whole pattern is moving across 70 receptors in 150 msec. Recently, evidence 
has been accumulating which suggests that the visual system is able to perform a very precise 
temporal interpolation as well, by reconstructing the spatial pattern of activity at moments intermediate 
between discrete temporal presentations (Barlow, 1979). The most telling demonstration, apart from 
cinematography, was introduced by D. Burr (1979a, see also Morgan, 1980) and is shown in the top 
inset of fig. 8c. Vernier line segments are displayed stroboscopically at a series of stations to portray 
a moving vernier; an illusory displacement occurs if the line segments are accurately aligned in space 
but are displayed with a few milliseconds delay in one sequence relative to the other. Not only do the 
segments appear to move smoothly from one station to the next but also, between the strobes, they 
are seen to occupy positions between those where they are actually exposed. The accuracy of detecting 
the equivalent displacement is again in the vernier acuity range, provided that the target moves at 
constant speed and elicits a clear sensation of motion. One is forced to conclude that not only spatial 
but also temporal interpolation is performed in the visual system to preserve acuity (and resolution) 
for objects in motion (see Barlow, 1979). 

It is clear that the attainment of such spatiotemporal accuracy does not break any physical law (see 
Westheimer, 1976). As pointed out by Barlow (1979) and by Crick et al. (1980), the classical sampling 
theorem allows a correct reconstruction of the visual input from a set of discrete samples in space 
and time since the LGN signal is bandlimited in temporal and spatial frequency by the photoreceptor 
kinetics and the eye’s optics respectively. In particular, Crick et al. have suggested (similarly to 
Barlow) that the fine grid of granule cells in layer IVc of the striate cortex performs an interpolation 
on the output of the LGN fibres, with the goal of representing the position of zero-crossings (the 
boundaries between activity in an ON and OFF ganglion cell layer) with a very high accuracy (see 
also Marr and Hildreth, 1980 and Marr et al., 1979). 

Although spatiotemporal interpolation can be well understood in terms of information theory, 
the astonishing performance of the visual system seems to require an algorithm and corresponding 
mechanisms of great ingenuity and precision. As we hinted earlier, an understanding of visual 
interpolation may also be quite interesting from a purely information processing point of view. High 
resolution, smear-free real time imagery could benefit significantly from this study of human vision. 
Here we investigate some properties of this spatiotemporal interpolation. In particular, we examine its 
performance for a range of “sampling intervals” in space and time. 


2.2 Methods 

The vernier target used in these experiments consisted of a thin vertical bar made up of two segments. 
The stimuli were generated on a Tektronix 604 display under the control of analog electronics. Each 
bar was intensified for 0.1 msec at $\Delta t$ msec intervals at $n$ successive stations horizontally displaced by 
a separation $\Delta x$. Each of the two segments making up the bar was 24' high and 1.5' wide, intensified 
to a luminance of about 50 times detection threshold on a background of 10 cd/m$^2$. During an 
experimental run, a target was presented every 3 seconds. Brief displays of $n \cdot \Delta t = 150$ msec, 
with randomized direction of motion (terminating at the central fixation point) were used to prevent 
effective pursuit eye movements (Westheimer, 1954). The experiments measured 

a) the acuity for detection of real vernier offsets of the two segments by $\delta x$ seconds of arc 

b) the acuity for detection of apparent vernier offsets produced by delaying the presentation of the 
lower or upper segment, displayed at the same sequence of stations, by $\delta t$ msec 

c) the acuity for detection of mixed vernier offsets produced by a real spatial offset $\delta x$ together with 
a temporal delay $\delta t$ of opposite sign. 

Figure 3. Vernier resolution threshold of spatial offset for different separations $\Delta x$ between the stations as a function of velocity. 
Symbols: × $\Delta x = 1'$, ○ $2.5'$, △ $7.5'$, ● $15'$, □ $30'$. 
Fig. 3a shows the data from subject AK, fig. 3b from subject TV. The standard deviation of the data is about 25% of the threshold 
value for fig. 3a and 20% for fig. 3b. In fig. 3a the point for $\Delta x = 1'$ (velocity value illegible) was measured masking the beginning and 
the ending of the trajectory; the same procedure did not change the threshold for the point at $v = 2.0$°/sec. Of the two points 
at $\Delta x = 2.5'$ and $v = 25$°/sec in fig. 3a, the worse value has been measured under the "masking" condition, whereas the 
better one was measured in the standard way. In fig. 3b also the point at $\Delta x = 2.5'$ and $v = 25$°/sec was measured with zero 
offset at the first and last station (from Fahle and Poggio, 1981). 

In a forced choice task the subject was required to signal whether the bottom segment was 
displaced to the right or to the left of the top segment by setting a binary switch. Acuity was determined 
by the standard criterion of 75% correct identification. In all experiments reported here T is constant 
(T = 150 msec) and, as a consequence, the number of stations n is variable (n = 2 to 95). More details 
about the methods are given in Fahle and Poggio (1981). 
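
Because the display duration is fixed, the strobe interval and the number of stations follow from the separation and the velocity. The sketch below simply restates the relations $\Delta t = \Delta x / v$ and $n \cdot \Delta t = T$ from the text; the function names are hypothetical and rounding $n$ to the nearest integer is an assumption.

```python
def strobe_interval_ms(separation_arcmin, velocity_deg_per_sec):
    """Time between strobes, Delta t = Delta x / v, in msec."""
    velocity_arcmin_per_ms = velocity_deg_per_sec * 60.0 / 1000.0
    return separation_arcmin / velocity_arcmin_per_ms

def number_of_stations(separation_arcmin, velocity_deg_per_sec, duration_ms=150.0):
    """Stations n that fit in the display time T, from n * Delta t = T."""
    dt = strobe_interval_ms(separation_arcmin, velocity_deg_per_sec)
    return round(duration_ms / dt)

# Example: a separation of 2.5' at 1.11 deg/sec.
dt_example = strobe_interval_ms(2.5, 1.11)   # about 37.5 msec
n_example = number_of_stations(2.5, 1.11)    # 4 stations
```

At low velocities and large separations the same relations leave only two stations in the 150 msec display, which is the lower end of the range of $n$ quoted above.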






PNN 


.12 


SPATIOTEMPORAL INTERPOLATION 


Figure 4. Vernier resolution thresholds of temporal offset for different separations $\Delta x$ between the stations as a function of velocity. 
Symbols: × $\Delta x = 1'$, ○ $2.5'$, △ $7.5'$, ● $15'$, □ $30'$. 
Fig. 4a shows the data from subject AK, fig. 4b from subject TV. The standard deviation is about 20% of the threshold values for 
subject AK and 18% for subject TV (from Fahle and Poggio, 1981). 

2.3 The Spatial Type of Acuity: Dependence on Velocity ($v$) and Separation ($\Delta x$) 

The results for spatial offsets (with simultaneous presentation of the two segments at each station) are 
shown in figs. 3a,b. The main result is that spatial acuity is relatively independent of the separation 
between the stations and of the velocity of the target up to rather large velocities. These data confirm 
and extend Westheimer’s and McKee’s results (1975), which showed that vernier acuity is unaffected 
by rate of movement from 0°/sec up to 4°/sec. Our results imply that this type of vernier acuity is 
relatively independent of $\Delta t$, the strobe interval. 


2.4 The Temporal Type of Acuity: Dependence on $v$ and $\Delta x$ 

Figs. 4a,b show the results for temporal offsets. The accuracy of detecting the equivalent 
displacement is in the classical vernier acuity range (compare Burr, 1979a,b): the best value for observer AK 
was 8" for spatial and 5" for temporal offset at comparable separations and velocities. Our main new 
result is that although acuity does not break down for large separations between the stations, at least 
up to half a degree, it deteriorates significantly almost in proportion to $\Delta x$ (see fig. 5). 

Figure 5. Fig. 5a shows the best vernier resolution threshold (with temporal offset) for each separation $\Delta x$. 
The data are from three subjects, AK, TV and HW (partly from fig. 4a and 4b). In fig. 5b the velocity 
$v$ for which optimal vernier resolution is found is plotted against the separation $\Delta x$. Same data as in fig. 5a. 
From Fahle and Poggio (1981). 


Vernier acuity of this temporal type is bad at low and high speed. As already clearly demonstrated 
by Burr (1979a,b) apparent motion is necessary for temporal offsets to be seen as spatial offsets. In 
our experiments, deterioration of acuity at low velocities could be due to the speed per se as well as to 
the lower number of stations (because our total presentation time is constrained to T = 150 msec the 
stimulus consisted, at the lowest velocities, of two stations). In any case, deterioration of acuity at low 
velocities can be linked with a decreased sensation of motion. 

A second important result is that the range of velocities for which temporal interpolation is good 
shifts upwards for larger separations between the stations. The fact that at higher separations higher 
velocities are required for good resolution suggests that a more revealing parameter is the time 
interval $\Delta t$ between the strobes. In fact, at any separation $\Delta x$, temporal interpolation is optimal for a 
temporal interval $\Delta t$ between 20 msec and 50 msec. 
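
The upward shift of the favourable velocity range follows directly from $v = \Delta x / \Delta t$. As a worked illustration (these are values implied by the 20 to 50 msec interval, not measured thresholds):

```latex
\begin{align*}
v = \frac{\Delta x}{\Delta t}, \qquad
20\ \text{msec} \le \Delta t \le 50\ \text{msec}
\;&\Longrightarrow\;
\frac{\Delta x}{50\ \text{msec}} \le v \le \frac{\Delta x}{20\ \text{msec}} \\[4pt]
\Delta x = 1'   &:\quad 0.33^\circ/\text{sec} \le v \le 0.83^\circ/\text{sec} \\
\Delta x = 2.5' &:\quad 0.83^\circ/\text{sec} \le v \le 2.1^\circ/\text{sec} \\
\Delta x = 7.5' &:\quad 2.5^\circ/\text{sec} \le v \le 6.25^\circ/\text{sec} \\
\Delta x = 15'  &:\quad 5^\circ/\text{sec} \le v \le 12.5^\circ/\text{sec} \\
\Delta x = 30'  &:\quad 10^\circ/\text{sec} \le v \le 25^\circ/\text{sec}
\end{align*}
```

Each band is a fixed window in $\Delta t$ mapped through the separation, which is why a single temporal parameter summarises the dependence on both $v$ and $\Delta x$.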







2.5 The Effect of Blur on Spatial and Temporal Acuity 

Standard vernier acuity is known to be affected, as one would expect, by attenuation of the high 
spatial frequencies of the vernier pattern (see for instance Stigmar, 1971). Is temporal interpolation 
also degraded in the same way? 

We have performed some experiments to answer this question by placing a ground glass screen at 
1 cm in front of the display. When a sharp line is viewed through such a ground glass screen the 
resulting light distribution has an approximately Gaussian line spread function with a width at 
half-height of at least 15', corresponding to a cutoff frequency of around 3-4 cycles/deg. Our data show 
that in the experimental situation of fig. 4, blur of the pattern improves acuity at large separations and 
velocities. Fig. 6 compares directly for the same observer and for the same separation the effect of 
blur on spatial and temporal interpolation. Westheimer's type of acuity is degraded by blur, whereas 
Burr’s type of acuity improves dramatically with blur (at high velocities). Out of five observers only in 
one case did blur of the pattern cause a reduction in temporal vernier acuity at high separations and 
velocities. 

These data again show that temporal hyperacuity has different characteristics from spatial 
hyperacuity. 


2.6. Spatial vs. Temporal Offset 

The apparent offset $\delta x_t$ produced by temporal delay $\delta t$ should follow the ideal relationship 
$\delta x_t = v \, \delta t$. As shown by our data the sign of the offset is indeed correctly detected. Does its size also satisfy 
this relation? How faithful, in other words, is temporal interpolation? To answer this question we 
measured the temporal delay $\delta t$ needed to compensate for a given real spatial offset $\delta x$ for different 
conditions. 

Fig. 7 shows that for a separation $\Delta x = 2.5'$ and a velocity $v = 1.1$°/sec the apparent offset 
$\delta x_t = v \, \delta t$ matches rather closely the real spatial offset $\delta x$. Under these conditions spatiotemporal 
interpolation is indeed rather precise (compare Burr and Ross, 1979). It is not so for higher velocities 
and/or larger separations (fig. 5). The temporal offset needed to compensate for a real spatial offset is 
then much larger. 

Figure 6. The effect of blur on spatial and temporal interpolation as a function of 
velocity for a separation between the stations $\Delta x = 15'$. Vernier resolution of a spatial 
offset is measured with (●) and without (○) blur. Vernier resolution of a temporal offset 
is also shown with and without blur. The screen was blurred as described in the 
text. Notice that the first point for spatial offset is for $v = 0$°/sec. The observer is TV. 
The standard deviation is about 20% of the threshold values. From Fahle and Poggio 
(1981). 
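
The ideal relationship between delay and apparent offset used in this section is a unit conversion that is easy to get wrong (degrees per second against seconds of arc and milliseconds). A minimal sketch, with hypothetical function names, of the ideal case only:

```python
def equivalent_offset_arcsec(velocity_deg_per_sec, delay_ms):
    """Ideal apparent offset v * delta_t, in seconds of arc."""
    return velocity_deg_per_sec * 3600.0 * (delay_ms / 1000.0)

def compensating_delay_ms(offset_arcsec, velocity_deg_per_sec):
    """Delay that would exactly cancel a real offset under perfect interpolation."""
    return 1000.0 * offset_arcsec / (velocity_deg_per_sec * 3600.0)

# Example: at 1.11 deg/sec a 5 msec delay is equivalent to about 20" of arc.
offset_example = equivalent_offset_arcsec(1.11, 5.0)
```

Points on the diagonal of fig. 7 are those for which the measured compensating delay equals the value returned by `compensating_delay_ms`; the mismatch reported at higher velocities and larger separations corresponds to measured delays that exceed it.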


3.1. Spatiotemporal Interpolation: How is it Done? 

The previous results constrain the problem of hyperacuity tightly enough to justify a theoretical 
analysis of how spatiotemporal interpolation may be done in the visual system. The precise meaning 
of interpolation in terms of our visual stimuli is a well defined question, and this is the main point to 
discuss. 

3.1.1. A Simple Illustration 

Fig. 8 illustrates a very simple scheme for achieving spatiotemporal interpolation of a visual pattern. 
The elements of this scheme could be interpreted as cells with associated receptive fields and temporal 
impulse responses. Alternatively, Fig. 8 represents a computational scheme for spatiotemporal 
interpolation. Visual input is sampled in space by an array of cells with a sampling density high enough to 
preserve the whole of the spatial information (in accordance with the sampling theorem). The input 
is then reconstituted in more detail on a finer grid of cells by convolving the sampled values with the 
function sinc $x$. In effect each cell of the interpolation layer weights its inputs according to a 
centre-surround receptive field. A variety of filters (i.e. “receptive fields”) are capable of performing a correct 
interpolation, especially in two spatial dimensions (see Crick et al. 1980). 

Figure 7. Temporal ($\delta x_t$) vs. spatial ($\delta x$) offset in the compensation experiment. The 
ordinate shows the temporal offset (in equivalent spatial units, $\delta x_t = v \cdot \delta t$) needed to 
compensate the spatial offset shown in the abscissa. ● is for a separation between the 
stations $\Delta x = 2.5'$ and a velocity $v = 1.11$°/sec ($\Delta t = 37$ msec). × is for $\Delta x = 2.5'$ 
and $v = 5.25$°/sec ($\Delta t = 7.0$ msec). ○ is for $\Delta x = 7.5'$ and $v = 4.11$°/sec ($\Delta t = 30$ msec). 
Larger separations yield an even greater mismatch. The continuous diagonal 
indicates the loci of perfect compensation. Subject TV. From Fahle and Poggio (1981). 
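
The spatial half of this scheme, reconstruction on a finer grid by weighting the samples with a sinc function, can be sketched as follows. This is an illustration of the classical sampling-theorem interpolation only, with hypothetical function names; a finite sum is used, so values near the ends of the sample array are less accurate than those in the middle.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolate(samples, spacing, x):
    """Estimate the signal at position x from samples at 0, spacing, 2*spacing, ...

    Each sample is weighted by a sinc centred on its own position, which is
    the 'receptive field' of a cell on the finer interpolation grid.
    """
    return sum(s * sinc((x - k * spacing) / spacing) for k, s in enumerate(samples))

# A band-limited test pattern: 0.2 cycles per sample, below the limit of 0.5.
frequency = 0.2
samples = [math.cos(2.0 * math.pi * frequency * k) for k in range(400)]

# Value on the finer grid, between samples 200 and 201.
estimate = sinc_interpolate(samples, 1.0, 200.37)
exact = math.cos(2.0 * math.pi * frequency * 200.37)
```

At the sample positions the interpolated value equals the sample itself, and between them it approaches the underlying pattern as long as the pattern contains no spatial frequencies above half the sampling rate.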


If the input intensity distribution is presented at discrete instants in time, temporal interpolation 
can be achieved by suitable temporal low pass properties of each individual pathway. If the temporal 
interval between presentations is small enough the effect of the filter is to reconstruct the original 
continuous temporal input. Spatial interpolation can then operate at each instant of time (this scheme 
would of course operate successfully for continuous movement of a pattern). 

Figure 8. (a) A simple scheme for spatiotemporal interpolation. The input pattern is sampled by an array 
of “cells”. Spatial interpolation is accomplished on a finer interpolation grid of cells, each one weighting the 
sampled values with a sinc-shaped receptive field (shown in the lower inset). Temporal interpolation is obtained 
by filtering each of the input channels with an appropriate low-pass or band-pass filter (its impulse response 
is shown in the upper inset). Thus a series of discrete frames of a moving pattern can be interpolated (see 
Theorem 1 in Appendix 2) into a continuous temporal function in each of the channels. The spatial input 
distribution outlined here represents an intensity edge as seen by centre-surround ganglion cells. (b) The spatial 
interpolation process in Fourier space. Interpolation is equivalent to filtering out the side lobes originated by 
the sampling process. Temporal interpolation can be interpreted in a similar way. From Fahle and Poggio 
(1981). 

Fig. 8b shows the Fourier interpretation of the spatial interpolation process (interpolation in time
can be interpreted in a similar way). The effect of sampling is to replicate the original spectrum in an
infinite number of side lobes. Spatial interpolation - i.e. reconstruction of the original function from
its samples - is accomplished by filtering out all side lobes but the central one - which is the original
spectrum.
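In symbols, this is the standard statement of the sampling theorem, written here for the spatial variable only. The notation G for the Fourier transform of g and the rect/sinc conventions are assumptions of this sketch, not the notation of Appendix 2.

```latex
\begin{align}
g_s(x) &= g(x)\sum_{n=-\infty}^{\infty}\delta(x - n\,\Delta x)
&&\Longrightarrow&
G_s(f_x) &= \frac{1}{\Delta x}\sum_{k=-\infty}^{\infty} G\!\left(f_x - \frac{k}{\Delta x}\right), \\
G(f_x) &= \Delta x\,\operatorname{rect}(f_x\,\Delta x)\,G_s(f_x)
&&\Longleftrightarrow&
g(x) &= \sum_{n=-\infty}^{\infty} g(n\,\Delta x)\,
\operatorname{sinc}\!\left(\frac{x - n\,\Delta x}{\Delta x}\right),
\end{align}
% valid when G(f_x) = 0 for |f_x| >= 1/(2 \Delta x);
% rect(u) = 1 for |u| < 1/2 and 0 otherwise, sinc(u) = sin(\pi u)/(\pi u).
```

The first line is the replication of the spectrum into side lobes spaced 1/Δx apart; the second is the removal of all lobes but the central one, which in space is the convolution with the sinc function.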

This model is probably the simplest conceivable scheme. In it, interpolation in space and time are
performed independently, since the temporal dependence of the input is not constrained in any way.
We now consider the conditions under which this scheme can be effective.

3.1.2 Remarks on Interpolation 

Before embarking on an analysis of various interpolation schemes, it is appropriate to make a few
general points which arise from the discussion so far.

First, the process of computing intermediate values from samples does not depend on the existence
of a finer retinotopic grid of “cells”, where the results are represented. All filtering transformations
indicated in Fig. 8 could be carried out at a rather symbolic level for only a few distinguished points.
Thus, it is important to keep separate the problem of a process from the problem of representing its
output. This paper is directly concerned only with the first issue.

Second, the goal of the interpolation process may be far more modest than a full reconstruction of
the input distribution. As suggested by Crick et al. (1980), the aim of interpolating the ganglion cells’
activity is to provide the position of the zero-crossings (where activity switches from the on-centre
to the off-centre cells) with high accuracy. This can be achieved by using very simple interpolation
functions such as a normal centre-surround receptive field (Marr et al., 1980).
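A small sketch may make the second point concrete. It assumes a step edge filtered by a balanced difference-of-Gaussians field and then sampled at unit spacing, and it uses plain linear interpolation between neighbouring samples as a stand-in for the "very simple interpolation function". The filter widths, the edge position and the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative distribution of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dog_edge_response(x, edge_position, sigma_centre=1.0, sigma_surround=1.6):
    """Response of a balanced centre-surround (DoG) field to a unit step edge."""
    z = x - edge_position
    # Convolving a step with a unit-area Gaussian gives a normal CDF.
    return normal_cdf(z / sigma_centre) - normal_cdf(z / sigma_surround)

def locate_zero_crossing(positions, responses):
    """Linear interpolation between the two samples that bracket the sign change."""
    for i in range(len(responses) - 1):
        left, right = responses[i], responses[i + 1]
        if left < 0.0 <= right:
            fraction = -left / (right - left)
            return positions[i] + fraction * (positions[i + 1] - positions[i])
    return None

positions = [float(i) for i in range(-8, 9)]   # "cells" one unit apart
true_edge = 0.3                                 # edge lies between two cells
responses = [dog_edge_response(x, true_edge) for x in positions]
estimate = locate_zero_crossing(positions, responses)
print(f"true edge at {true_edge}, zero-crossing located at {estimate:.3f}")
```

Only the position of the sign change is recovered, not the full input distribution, yet the estimate falls well inside the sample spacing.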

3.1.3 More Complex Interpolation Schemes are Required 

The scheme of Fig. 8 can provide a correct reconstruction of a spatiotemporal input sampled at
intervals Δx (in space) and Δt (in time) only when the input function is bandlimited in spatial (by
f_x^c) and temporal (by f_t^c) frequencies in such a way that Δx < 1/(2 f_x^c) and Δt < 1/(2 f_t^c)
(Theorem 1 in Appendix 2). The image which reaches the retina is indeed bandlimited in spatial frequencies to less
than about 60 cycles per degree by the diffraction-limited optics of the eye. Furthermore, a temporal
cutoff is imposed at the level of the photoreceptors by their limited temporal resolution. The scheme
of Fig. 8 can therefore correctly reconstruct an image sampled at intervals of less than 30" in space
(for the 2-D case see Crick et al., 1980). Temporal samples of the photoreceptor activity could be
interpolated under similar conditions (though regular temporal sampling in our visual system is highly
implausible).

Since the spacing of the photoreceptors is almost exactly matched to the eye’s optics, interpolation
in normal vision - when the image is a continuous function of time and space - can be accounted
for by simple schemes like that of Fig. 8. In particular, such models could account for the vernier
acuity measured with real continuous motion of the retinal image. When, however, motion of an
object is simulated by presenting the image at discrete positions at separate instants, the conditions of
Theorem 1 are in general no longer satisfied. In our experiments we present to the eye an image which
is already sampled either in time (Westheimer type of stimulus) or space (Burr type of stimulus) or
both. We enforce arbitrary sampling intervals Δx and Δt on the system before the bandlimiting
operations of the eye’s optics and of the receptor kinetics come into play. Under these conditions
the input function g(x, t) is not ensured to be appropriately bandlimited before spatial or temporal
sampling occurs. The scheme of Fig. 8 should for instance perform poorly when the input function
is sampled in space at intervals Δx significantly coarser than the photoreceptor array. Burr’s and our
data, however, show that under these conditions our visual system performs significantly better. We
are clearly forced therefore to consider other types of interpolation schemes.
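The 30 arcsec figure quoted above for a cutoff of 60 cycles per degree is simply the sampling-theorem limit 1/(2 f_c) converted to seconds of arc. The short check below makes the arithmetic explicit; the function name is an illustrative choice.

```python
def nyquist_interval_arcsec(cutoff_cycles_per_degree):
    """Largest sampling interval (in arcsec) permitted by the sampling theorem."""
    interval_deg = 1.0 / (2.0 * cutoff_cycles_per_degree)
    return interval_deg * 3600.0   # 3600 arcsec per degree

# 60 cycles/degree is the optical cutoff quoted in the text.
spatial_limit = nyquist_interval_arcsec(60.0)
print(f"sampling interval must stay below {spatial_limit:.1f} arcsec")
```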

3.2.1 The Spatiotemporal Spectrum of a Moving Vernier

Our analysis of alternative interpolation schemes begins with the description in frequency space of
the physical stimuli corresponding to Westheimer’s and Burr’s experimental situations. When a spatial
pattern g(x) moves continuously at constant speed, the resulting spatiotemporal distribution of
excitation on the retina has a simple representation in the Fourier space of temporal (f_t) and spatial (f_x)
frequencies. Its Fourier transform takes values only on the diagonal line shown in Fig. 9a with a slope
equal to the velocity (see Appendix 2). For each spatial frequency contained in the pattern, there is
a unique temporal frequency corresponding to it. Curtailing the duration of motion (in our case to
T = 150 msec) spreads the Fourier transform over a large area of temporal and spatial frequencies,
changing the narrow line into a wider area. The spread (along the f_t axis) is the same for all our data.
Thus the line supports shown in Fig. 9 must be interpreted as being spread along f_t as a sinc function.
For T = 150 msec the width of the spread is about 14 Hz for the central lobe of the sinc function and
28 Hz for the central lobe plus the first negative side lobe on both sides. The retinal stimulus elicited
by continuous motion of a vernier at constant velocity can be described in this way (see Appendix 2).
The upper and the lower segment have the same line support on the f_x - f_t plane. Their Fourier
transforms differ at all frequencies only by a phase factor which mirrors the spatial offset. The correct
detection of this information underlies positional acuity.

Figure 9. Legend

a) The support on the f_x - f_t plane of the Fourier spectrum associated with continuous motion
of a vernier (see inset) at constant velocity v. The slope of the line is v. g(f_x, f_t) equals g(f_x)
on that line. Curtailing the duration of motion to T = 150 msec spreads the line into a bar-like
support, corresponding to a sinc function. b) The support of the Fourier spectrum associated with
Westheimer’s type of experiment. The inset indicates that displaying the vernier stroboscopically at a
sequence of times with an interval δt is equivalent to “looking” at the continuous motion of a vernier
through a series of temporal “slits”. This has the effect of replicating the spectrum of Fig. 9a along
the f_t axis in an infinite number of side lobes. The distance of the lobes on f_t is 1/δt. The line
encounters the f_x axis at 1/(v · Δt) = 1/Δx (if Δx = 1', the distance of the side lobes on f_x is
60 cycles/deg). Notice that for any f_x, each lobe supports the same complex Fourier spectrum g(f_x).
c) The support of the Fourier spectrum associated with Burr’s type of experiment. Displaying the
line segments of a vernier in the same position but with a slight delay is equivalent to looking at the
continuous motion of a vernier through the spatial window depicted in the inset (transparent slits in
an otherwise opaque screen). This corresponds to replicating the spectrum of Fig. 9a along the f_x axis.
The distance of the lobes is 1/Δx, where Δx is the interval between successive slits in the spatial
window. At a given f_x, the Fourier spectrum g(f_x) of different lobes is in general different. d) The
support of the Fourier spectrum associated with the compensation experiment is the same as in Fig. 9c.
The different window corresponding to this stimulus (see inset) corresponds, however, to a different
complex Fourier spectrum (see Appendix 2). From Fahle and Poggio (1981).

Fig. 9 summarizes the description of the two basic stimulus configurations used in this paper
according to the derivation outlined by Fahle & Poggio (1981). Westheimer’s experimental situation
is equivalent to looking at the continuous motion of a vernier through a series of equidistant narrow
temporal slits within which the pattern is briefly visible (see Fig. 9b). Burr’s experimental situation
ideally corresponds to a vernier moving behind a spatial window with a series of equidistant narrow
slits (see Fig. 9c). The spatial or temporal windows affect differently the spectrum of the retinal input.
As indicated in Fig. 9, in the Westheimer situation the complex spatial spectrum of the pattern,
which contains amplitude and phase information, is replicated an infinite number of times along the
temporal frequency axis, whereas in the Burr case the same spectrum is replicated along the spatial
frequency axis. An important observation is that in Fig. 9b (Westheimer stimulus) all lobes at any
given f_x support exactly the same complex spectrum g. This is not so in Fig. 9c (Burr stimulus), where,
instead, all lobes have the same g at any given f_t. We re-emphasize that Fig. 9 describes the physical
properties of the different stimuli without any reference to the human visual system.
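The numbers attached to Fig. 9 can be checked with a few lines of arithmetic. The sketch below assumes that the spread caused by a motion of duration T is the sinc function of a rectangular time window (zeros at multiples of 1/T), and that stations Δx apart are visited at velocity v, so that Δx = v · Δt. Function and key names are illustrative.

```python
def sinc_lobe_widths_hz(duration_s):
    """Widths along f_t of the sinc spread caused by a motion of finite duration."""
    first_zero_hz = 1.0 / duration_s              # zeros of sinc(f*T) at multiples of 1/T
    central_lobe = 2.0 * first_zero_hz            # from -1/T to +1/T
    with_first_side_lobes = 4.0 * first_zero_hz   # from -2/T to +2/T
    return central_lobe, with_first_side_lobes

def side_lobe_spacing(dx_arcmin, velocity_deg_per_s):
    """Spacing of the spectral replicas for stations dx apart visited at velocity v."""
    dx_deg = dx_arcmin / 60.0
    dt_s = dx_deg / velocity_deg_per_s            # dx = v * dt
    return {
        "dt_ms": 1000.0 * dt_s,
        "spacing_along_fx_cpd": 1.0 / dx_deg,     # cycles per degree
        "spacing_along_ft_hz": 1.0 / dt_s,        # Hz
    }

central_hz, extended_hz = sinc_lobe_widths_hz(0.150)
print(f"T = 150 msec: central lobe {central_hz:.1f} Hz, with first side lobes {extended_hz:.1f} Hz")

one_arcmin = side_lobe_spacing(1.0, 1.0)
print(f"dx = 1 arcmin: lobes {one_arcmin['spacing_along_fx_cpd']:.0f} cycles/deg apart on f_x")
```

With T = 150 msec the widths come out at about 13.3 Hz and 26.7 Hz, slightly below the rounded values of 14 Hz and 28 Hz given in the text, and a separation of 1' gives the 60 cycles/deg spacing stated in the legend.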

3.2.2 Computational Aspects of Interpolation: The Constant Velocity Assumption

More effective interpolation schemes are feasible if general constraints about the nature of the visual
input are incorporated directly in the computation. The key observation here is that the temporal
dependence of the visual input is usually due to movement of rigid objects, and that in everyday life
motion has a nearly constant velocity over the times and distances which are relevant to the
interpolation process (T < 100 msec and x < 1°). The constant velocity assumption leads to a more specific
form of the sampling theorem, given in Appendix 2 (see also Crick et al., 1980), which states formally
what is intuitively clear: the spatiotemporal sampling rate can become very low without losing
information. Interpolation schemes based on the constant velocity assumption exploit the equivalence of
the time and space variable (x ~ vt). From the point of view of filtering this means that spatial
and temporal interpolation cannot be performed independently as in the simple scheme of Fig. 8. In
the Fourier domain the constant velocity assumption constrains the spectrum of the visual input to
lie on the line support shown in Fig. 9a. In the ideal case of infinitely long motion the side lobes
generated by sampling either in time (Fig. 9b) or space (Fig. 9c) can always be excluded by means
of appropriate filters, if the precise value of v is known (e.g. by measurements). The recovery of
the original spectrum (Fig. 9a) corresponds to an ideal interpolation for arbitrarily large sampling
intervals (if v is known and different from zero). In the realistic case of finite duration of motion, finite
sampling intervals are enforced by the spread of the Fourier spectrum into a larger area, but the same
basic arguments still apply.
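The equivalence of the time and space variables can be illustrated directly in space-time rather than in the Fourier domain. The sketch below assumes a hypothetical Gaussian bar moving at a known velocity and shown at two instants 30 msec apart. It compares an estimate that shifts one frame along the trajectory with a separable estimate that interpolates in time at each fixed position, as a scheme that ignores motion would. All names and values are illustrative.

```python
import numpy as np

def pattern(x):
    """Hypothetical spatial pattern: a narrow Gaussian bar (x in degrees)."""
    return np.exp(-0.5 * (x / 0.05) ** 2)

def frame(x, frame_time, velocity):
    """The moving pattern as displayed at one discrete instant."""
    return pattern(x - velocity * frame_time)

def constant_velocity_estimate(x, t, frame_time, velocity):
    """Predict the input at time t by shifting a single frame along x = v*t."""
    return frame(x - velocity * (t - frame_time), frame_time, velocity)

def separable_estimate(x, t, t0, t1, velocity):
    """Linear interpolation in time at each fixed x, ignoring the motion."""
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * frame(x, t0, velocity) + w * frame(x, t1, velocity)

velocity = 4.0            # deg/sec, assumed to be known exactly
t0, t1 = 0.0, 0.030       # two presentations 30 msec apart
t_mid = 0.015             # instant at which the input is to be interpolated
x = np.linspace(-0.2, 0.4, 601)

truth = pattern(x - velocity * t_mid)
error_cv = np.max(np.abs(constant_velocity_estimate(x, t_mid, t0, velocity) - truth))
error_sep = np.max(np.abs(separable_estimate(x, t_mid, t0, t1, velocity) - truth))
print(f"constant-velocity error: {error_cv:.2e}, separable error: {error_sep:.2f}")
```

With the velocity known, a single frame suffices however far apart the presentations are, whereas the separable estimate yields two faint bars at the old and new stations instead of one bar midway.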

3.2.3 Implementing the constant velocity scheme 

An interpolation scheme of this type could be implemented simply by measuring the exact velocity
of movement and then reconstructing the spatiotemporal trajectory of the pattern for either temporal
or spatial information. Another, more attractive possibility is suggested by the idea, supported by
much psychophysical evidence, that in the human visual system there exist several channels at each
eccentricity, i.e. several sets of receptive fields tuned to different spatial sizes and with different
temporal properties. We imagine, following Burr (1979b), that these channels have somewhat
overlapping supports covering the region of the (f_x - f_t) Fourier plane which corresponds to the sensitive
range of the visual system. “Stasis” channels are tuned to high spatial frequencies (small receptive
fields) and low temporal frequencies (sustained properties); “motion” channels are tuned to low
spatial frequencies (large receptive fields) and high temporal frequencies (transient properties). Thus,
each channel is tuned to a different range of velocities, centred on the ratio between the optimal
temporal and spatial frequencies characteristic for the channel: stasis channels for instance are tuned
to low velocities whereas motion channels are tuned to high velocities. Fig. 10b shows a set of
idealized “velocity channels” of this type. Since each channel has its own cutoff in temporal and spatial
frequency, interpolation may be performed independently and with different characteristics within
each channel. In the Burr type of experiment stasis channels could correctly interpolate only patterns
displayed at small separations and low velocities, whereas motion channels could be effective (but not
so accurate) at large separations and high velocities by filtering out the side lobes arising from the
coarse spatial sampling. The complementary argument applies for coarse time sampling. As indicated
in Fig. 10b the stasis channels may suffer from aliasing at values of Δx for which the motion channels
interpolate correctly. We assume, then, that in this scheme the wrong channels are switched off by use
of velocity information.
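The notion that each channel is centred on the velocity f_t/f_x can be written down in a few lines. The optimal frequencies below are purely hypothetical placeholders chosen for illustration, not measured channel properties, and selecting the closest channel on a logarithmic velocity scale is only one conceivable way of "switching off the wrong channels".

```python
import math

# Hypothetical optimal frequencies (temporal in Hz, spatial in cycles/deg).
CHANNELS = {
    "stasis": {"ft_hz": 2.0, "fx_cpd": 10.0},   # sustained, small receptive fields
    "motion": {"ft_hz": 10.0, "fx_cpd": 1.0},   # transient, large receptive fields
}

def preferred_velocity(channel):
    """Velocity (deg/sec) on which a channel is centred: optimal f_t over optimal f_x."""
    return channel["ft_hz"] / channel["fx_cpd"]

def best_matching_channel(stimulus_velocity, channels=CHANNELS):
    """Channel whose preferred velocity is closest on a logarithmic scale."""
    def mismatch(name):
        return abs(math.log(stimulus_velocity / preferred_velocity(channels[name])))
    return min(channels, key=mismatch)

for name, channel in CHANNELS.items():
    print(f"{name}: preferred velocity {preferred_velocity(channel):.1f} deg/sec")
print("slow target handled by:", best_matching_channel(0.5))
print("fast target handled by:", best_matching_channel(5.0))
```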

Fig. 10c shows a more realistic interpolation scheme of the same basic type. Instead of many
channels, each one sharply tuned to velocity and inactivated when the pattern does not move at its
characteristic velocity, there are a few channels coarsely tuned to velocity and without any precise
velocity-sensitive inactivation, apart from direction-selective properties.

In the light of this analysis we turn now to a detailed discussion of our experiments. Our main
question concerns of course which type of interpolation scheme is actually used by our visual system.


4.1 Westheimer’s Acuity: Recovery of Spatial Offset

a) In Fourier terms, the aim of the interpolation process is to filter out the side lobes, preserving only
the central lobe, as the latter represents the Fourier spectrum of a continuously moving bar.

When both the time interval Δt between presentations and the velocity v are small, interlacing
of the side lobes in the Fourier spectrum is negligible. Temporal low-pass properties of the visual
pathway, as in the model of Fig. 10a, suffice for eliminating the side lobes and thus achieve a correct
interpolation. When Δt is large, however, interlacing is considerable in the sense that, even for the
scheme of Fig. 10c, there are one or more channels which mix the main lobe with at least one of the
side lobes. Because of the spread associated with the short duration of the motion sequence, actual
overlap between the lobes can be significant. It turns out, however, that this does not represent a
problem from the point of view of the spatial acuity measured in our experiments. At each f_x the
complex Fourier spectrum on all side lobes is exactly the same. Thus, the spatial spectrum is correct
irrespective of the temporal frequency and independently of the number of side lobes contained
in the support of the interpolation filters. At large Δx and high v, the presence of the side lobes
turns out to be even beneficial for vernier acuity; under these conditions high frequency channels,
which would not be stimulated by continuous motion, can obtain the correct spatial information from
the side lobes, which are an artefact of the discrete time presentations. On the whole, and in the
absence of a sophisticated interpolation process that always excludes all side lobes (such as the scheme
of Fig. 10b), one expects vernier acuity to be rather invariant for a wide range of separations and
velocities. Our data conform well to these expectations. Notice that the presence of side lobes at high
velocities and large separations corresponds to the perception not of a moving bar but of a briefly
illuminated stationary grating - which carries however the correct spatial information. In this sense
at large Δx and high v interpolation fails to retrieve the “correct” spatiotemporal pattern, but still
preserves spatial acuity (even at extremely high speeds).

Figure 10. (a) The support on the Fourier plane of spatial and temporal frequencies of an interpolation
filter corresponding to a scheme such as Fig. 8. (b) The support on the Fourier plane of a set of spatiotemporal
filters ideally tuned to different velocities. A large number is needed to cover all velocities of interest. The
filters are assumed to be direction selective, since they only operate in the Fourier quadrants corresponding
to positive v = f_t/f_x in g(x + vt). A spatial pattern moving at constant velocity and sampled at spatial
intervals δx has on this plane the support shown by Fig. 9c. To avoid aliasing, the low velocity filters can
be “switched off” by information about the velocity of the motion. (c) A more realistic set of filters, broadly
tuned to different velocities. The stasis channel is tuned to low temporal and high spatial frequencies and
thus to low velocities. The motion channel is tuned to high temporal and low spatial frequencies and thus
to high velocities. Intermediate channels (not shown here) may also be present. The hatched areas represent
the support of such directional filters. Nondirectional filters would have also a symmetric support in the other
two quadrants. From Fahle and Poggio (1981).

b) The qualitative interpretation of our data in usual space-time variables is straightforward. Spatial
interpolation, for instance by appropriate receptive fields, takes place correctly for each frame (i.e. for
each station) even when temporal interpolation fails. Since our forced choice task measures only
spatial acuity, performance is in this case independent of the interpolation of the temporal dependence of
the visual input.

c) These results suggest that spatiotemporal interpolation is not performed by the “ideal”
interpolation scheme of Fig. 10b. In that case temporal aspects should be retrieved correctly at all Δt, while
acuity for high velocities should be exactly as bad as for continuous motion. The one channel scheme
of Fig. 10a could explain these data on positional acuity; but as pointed out by Burr (1979b, 1980) the
image should then be inevitably smeared at all but very low velocities.

4.2 Burr’s Acuity: Interpolation of Temporal Offset

a) In Burr’s experiment the situation is quite different. For any given f_x the side lobes contain
different parts of the original spectrum. Thus when more side lobes lie in the support of the same
channel (in Fig. 10a or Fig. 10c) there is a mixture of spatial frequencies, detrimental to acuity. One
understands, therefore, that acuity deteriorates considerably (see fig. 2) with increasing overlap among
the side lobes (large separations between the stations). At any given (large) separation, low velocities
bring about a considerable overlap between the side lobes. Higher velocities reduce the degree of
overlap at the expense of high spatial frequency information, which is filtered out by the temporal
cutoff(s) of the visual pathway (between 20 and 50 Hz, see for instance Kelly, 1979). Thus one
expects to find for each separation Δx an optimal velocity at which the side lobes just avoid overlap.
Assuming a spread of ≈ 15 Hz the optimal velocity (in degrees/sec) should be v = 30 · Δx (Δx in
degrees), which is in rough agreement with the data of fig. 5b. When the velocity approaches zero the
line supports in Fig. 10c all tend to lie on the f_x axis (notice that, because of the finite presentation time
T, the supports effectively overlap). In this situation information about the offset cannot be retrieved.
In the limit of very high velocity the set of lobes approaches the line spectrum of a stationary grating
with no offset. Notice that we assume for the scheme of Fig. 10c that the vernier threshold is higher
when some of the channels signal zero offset while the others still “see” the correct offset.
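The rule v = 30 · Δx can be recovered under one plausible reading, which is an assumption of this sketch and is not spelled out in the text: the replicas of the Burr stimulus are v/Δx apart along the temporal frequency axis, each is spread by about 15 Hz on either side, and neighbouring lobes just clear one another when v/Δx equals twice that spread.

```python
def optimal_velocity_deg_per_s(dx_arcmin, spread_hz=15.0):
    """Velocity at which neighbouring side lobes just stop overlapping along f_t."""
    dx_deg = dx_arcmin / 60.0
    # Replicas are v/dx apart along f_t; each is assumed to extend
    # spread_hz on either side, so the lobes clear when v/dx = 2*spread_hz.
    return 2.0 * spread_hz * dx_deg

for dx_arcmin in (2.5, 7.5):
    v_opt = optimal_velocity_deg_per_s(dx_arcmin)
    print(f"dx = {dx_arcmin}': optimal velocity about {v_opt:.2f} deg/sec")
```

On this reading a separation of 2.5' gives about 1.25 deg/sec and 7.5' gives about 3.75 deg/sec; whether these values match the measured optima has to be judged against the data, not against this sketch.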

b) When the temporal component of the filters fails to interpolate between temporal frames motion
is perceived as discontinuous. As a consequence the spatial interpolation process correctly signals zero
spatial offset for each frame. The critical strobe interval which yields optimal temporal interpolation
is not very different between the channels (see Fig. 5a). Though its performance may worsen at
high velocities, as for the continuous motion, it should be rather invariant with respect to Δx, the
separation between the stations. Fig. 5a shows that this does not happen. The opposite conclusion
holds for the scheme of Fig. 10a. Its performance should deteriorate rapidly for separations Δx
between the stations larger than the distance between photoreceptors, which is in conflict with Burr’s
and our data. An interpolation scheme of the type of Fig. 10c seems consistent with these results:
while small, slow “receptive fields” would be unable to interpolate correctly at large separations (Δx
large), fast receptive fields could perform a correct interpolation, if the velocity is appropriate.

The fact that spatial acuity is extremely good at separations up to 2.5' suggests that the interpolation
channels are direction selective.

4.3 Effect of Blur 

a) The interpolation scheme outlined in Fig. 10c makes a rather strong prediction about the effect
of blur. In the Westheimer case blur can only degrade vernier acuity, since it eliminates the high
frequency channels. Blur of the Burr stimulus, however, should improve acuity at least at large
separations and high velocities, since it eliminates side lobes which signal the absence of an offset. Our data
are fully consistent with this expectation. A more perceptual but equivalent description of the effect
of blur is this. At high velocities and large separations there is a strong sensation of a grating of thin,
unbroken lines - corresponding to the side lobes seen by visual mechanisms tuned to low temporal
and high spatial frequencies - and a weak impression of a single moving target with a clear offset
- corresponding to the main lobe seen by mechanisms tuned to lower spatial and higher temporal
frequencies. This ambiguity is removed, as already noticed by Burr (1979), by the blur of the screen,
which suppresses the high frequency grating.

b) In other terms, blur eliminates the contribution of the small receptive fields which are unable 
to interpolate correctly at large separations and therefore signal zero offset. The large receptive fields, 
however, remain largely unaffected by blur. 

c) The effectiveness of blur in improving vernier acuity at large Ax shows that our visual system 
does not normally have the intrinsic possibility of switching off the wrong channels as assumed in the 
scheme of Fig. 10b. 

4.4 Spatial vs. Temporal Compensation 

a) This stimulus situation corresponds to looking at the continuous motion of a vernier through the
spatial window shown in the inset of Fig. 9d. The resulting Fourier support is again as in Fig. 9c:
here, however, the main lobe signals no offset, corresponding to precise spatiotemporal
compensation, whereas the other lobes all signal the spatial offset between the upper and lower grating of
the window. In other words, exact compensation between space and time is realized only in the
main, correct lobe. Thus, the spatial offset should dominate as soon as the side lobes are “seen”
by some of the channels of Fig. 10c. This is increasingly so for larger separations Δx between the
stations. Correspondingly, the perception of the stationary grating carrying spatial offset information
(the broken slits in the window of Fig. 9d) is expected to dominate at large separations and velocities.
Again our data are consistent with these expectations. Even at relatively small separations between
the stations (see fig. 7) the system does not achieve a perfect interpolation - that is, removal of all
side lobes. Only in this case would the temporal offset exactly cancel the spatial offset. As expected,
blur improves compensation, since it helps to remove the “wrong” side lobes, which carry information
only about the spatial offset.

b) This experiment combines Burr and Westheimer stimuli. Since spatial interpolation always
retrieves the spatial offset, this dominates for all cases in which the temporal component of
interpolation is not fully correct.

5. Discussion 

To summarize, the psychophysical experiments reported here suggest that spatiotemporal
interpolation in the visual system, remarkable though it is, is far from being perfect and flawless. Ideal
interpolation is equivalent to filtering out the side lobes in the Fourier spectrum arising from the
discrete presentations. The task is easy at small separations but requires in principle complex filters for
large separations (see Crick et al., 1980). As our data suggest, our visual systems do not seem to use
a very sophisticated spatiotemporal interpolation process. The side lobes are not effectively filtered
out under all conditions. Spatiotemporal interpolation, then, can be considered as a direct
consequence of the spatial and temporal properties of early vision, in terms of an interpolation scheme of
the type of Fig. 10c. The existence of independent channels tuned to different spatial and temporal
frequencies seems to account for the spatiotemporal interpolation revealed by our experiments. A
detailed theoretical analysis with the help of appropriate computer experiments is necessary for a
quantitative evaluation of interpolation models of this type.

5.1 Explicit or implicit interpolation? 

Interpolation can be regarded as a spatiotemporal filtering of the input transmitted from the retina.
This is the point of view taken in this paper. We cannot advance any hypothesis as to where this
filtering stage may be localized in the brain on the basis of our psychophysical data alone. Throughout
this paper we have used the term “interpolation” without necessarily implying a direct reconstruction
of the pattern of visual activity, say its zero-crossing profile in the various channels, somewhere in the
visual pathway. Clearly, hyperacuity may simply rely on a specialized routine operating on a small
region of the image to answer specific questions, like the right-left choice in a vernier task. Thus
the interpolation scheme suggested by our data may be implemented as an “implicit interpolation”,
that is, as a computational process involving manipulation of symbolic quantities; or it may depend
on an “explicit reconstruction” of a (coded) version of the array of photoreceptor activity on a fine
retinotopic grid of neurons. These extreme possibilities - and all in between - can be implemented in a
variety of ways. For instance, activity may be reconstructed automatically on the fine topographic grid
of layer IVcβ by an automatic, parallel process.

On the other hand, a specific, more symbolic process could read the output of retinal ganglion cells 
and perform the correct interpolation for any desired position and time. In this case interpolation 
would be implicit and mixed with the decision process itself. 

In the first case, the decision routine (is the upper segment to the right or to the left?) would
operate on an interpolated version of the image. Thus, “reprogramming” of the vernier routine may
not be expected to affect the interpolation process but only the detection criteria, contrary to the
second case, in which different detection strategies may influence interpolation.

5.2 Are the Psychophysical Channels the Interpolation Filters?

Our data support interpolation schemes of the type outlined in Fig. 10c. They say, however, neither
how many independent channels are needed, nor exactly what their spatiotemporal properties are.
Our results seem consistent with standard characterizations of their spatial and temporal properties
(Campbell and Robson, 1968; Burr, 1979b; see also Marr et al., 1980; Wilson and Giese, 1977; Wilson
and Bergen, 1979).

These observations suggest the interesting idea that the spatial frequency tuned channels present
in early human vision may be the interpolation filters themselves. To be completely explicit let us
consider simple examples of how an interpolation scheme such as Fig. 10c might be implemented
in the visual system. The first possibility is that the image is filtered before interpolation through
various independent channels. Retinal or LGN ganglion cells of different sizes could represent the
image filtered at different resolutions. Later in the visual pathway each of these representations would
be independently interpolated on a finer cortical grid of cells with a receptive field very similar to
the corresponding LGN cells. Another possibility is that only two of the channels are present at the
precortical level (e.g. X and Y) and that the measured psychophysical channels represent interpolation
filters operating on their X and Y input at the cortical level. In this second case one would expect only
two sizes of receptive fields - at each eccentricity - in the retina and LGN but a scatter of sizes in the
cortex (possibly in IVc). Thus the same retinal channel may be interpolated in two different ways, by
small cortical receptive fields and by large ones, the first reconstructing the high frequency content
of the retinal channel and the second emphasizing its coarser details. Notice that as a consequence
cortical (interpolation) channels may have a narrower bandwidth than retinal ones.

5.3 A prediction: interpolation must be direction selective 

An explicit interpolation scheme of this type consists of a set of motion channels with direction
selective properties, in the sense that the spatiotemporal interpolation filter thereby implemented must
depend (in one dimension) on the sign of v (see appendix of Fahle and Poggio, 1981). As a
consequence the interpolation channels should have some type of direction selective property; furthermore,
cells of layer IVc - if they are involved at all - should show, despite their centre-surround receptive
field, some non-standard direction selective property.


6. Interpolation in the perifoveal visual field: does aliasing occur?

In the perifoveal retina, the spacing of the ganglion cells increases, as Barlow pointed out, whereas
the optical cut-off remains approximately the same (for instance at 10° eccentricity; see Weale, 1976).
The grid of ganglion cells is, however, matched to the spatial cut-off of the signal thereby represented:
in the cat, Peichl and Wassle (1979) have shown that receptive field diameter and ganglion cell
separation both increase towards the periphery so that sampling in the array of ganglion cells takes place at
the interval appropriate to the cut-off frequency passed by the larger receptive fields. Thus, the grid of
ganglion cells is likely to satisfy the sampling theorem (see Hughes, 1981).

A more serious, and so far unsolved, problem is whether in the perifoveal visual field the signal
represented by the ganglion cells suffers from aliasing, i.e., undersampling, at the level of the
photoreceptors. If only cones are involved, aliasing seems unavoidable for eccentricities larger than
about 5° to 10°. The classical sampling theorem requires that the signal be lowpass filtered before
sampling in order to avoid overlap of the sidelobes in the Fourier spectrum (i.e., aliasing). Lowpass
filtering after sampling cannot always avoid aliasing.

It is easy to show that ideal lowpass filtering after sampling eliminates overlap of the sidelobes
only up to sampling intervals that are twice the limit set by the sampling theorem (see footnote 2).
Preliminary computer experiments support these conclusions for the approximately lowpass filtering
performed by a center-surround receptive field; in this case, however, the effectiveness of lowpass
filtering decreases more gradually with increasing sampling intervals.
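The factor of two can be checked with a small sketch. It assumes an ideally bandlimited signal with cutoff `fc` sampled at rate `fs`; these names and the function `alias_free_cutoff` are illustrative and do not come from the original text. Sampling replicates the spectrum around multiples of `fs`, so the first replica contaminates everything above `fs - fc`, and an ideal lowpass filter applied after sampling can keep at most the band below that frequency.

```python
def alias_free_cutoff(fc, fs):
    """Highest frequency an ideal lowpass filter applied AFTER sampling can
    retain without aliasing (0.0 if no clean band is left).

    fc: cutoff frequency of the (ideally bandlimited) input signal
    fs: sampling rate, i.e. 1 / sampling interval
    """
    if fc <= 0 or fs <= 0:
        raise ValueError("fc and fs must be positive")
    # The first spectral replica occupies [fs - fc, fs + fc]; frequencies
    # below fs - fc (and below fc itself) are free of overlap.
    clean = min(fc, fs - fc)
    return max(0.0, clean)


if __name__ == "__main__":
    fc = 1.0
    nyquist_interval = 1.0 / (2.0 * fc)  # limit set by the sampling theorem
    for factor in (1.0, 1.5, 2.0, 2.5):
        interval = factor * nyquist_interval
        print(factor, alias_free_cutoff(fc, 1.0 / interval))
```

Under these assumptions the clean band shrinks to nothing exactly when the sampling interval reaches twice the classical limit, and the surviving band is paid for with the loss of the higher spatial frequencies.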

This scheme is somewhat supported by Poliak’s data showing that visual acuity thresholds increase
with eccentricity more than the separation between cones does. Convergence of cones on X ganglion cells
is therefore likely to increase with eccentricity.

If aliasing cannot be fully avoided, hyperacuity thresholds must rise faster with eccentricity than
visual resolution thresholds, a result which has been recently established by Westheimer (1982). If the
reason for this were indeed aliasing, blur of the vernier pattern should improve vernier acuity in the
periphery, at least in the absence of noise. Blur of the pattern corresponds to lowpass filtering of the
signal before sampling, as required by the sampling theorem. Preliminary experiments performed to
test this prediction indicate, however, that blur may improve hyperacuity only slightly, if at all (Fahle
and Poggio, 1981; Westheimer, pers. comm.; Fahle, pers. comm.).

A possible explanation for this small effect arises if input from rods (in addition to cones) is also
allowed. Aliasing in the periphery could then be largely avoided at all eccentricities by lowpass
filtering the image before sampling, that is, by pooling together inputs from all neighboring
photoreceptors (rods and cones) via either gap junctions or synaptic coupling in second order neurons.
If this prediction were correct, the decrease of vernier acuity with eccentricity would not depend on
aliasing but would simply be a graded phenomenon due to the increasing spacing (in terms of visual
angle) of the cortical grid and to a decreasing signal-to-noise ratio (because of the decreasing density
of cells). The ineffectiveness of blur is consistent with this scheme. A critical test of this hypothesis
may be obtained by measuring vernier acuity in the periphery under different conditions of light adaptation.
An important corollary of this prediction is that the space constant of the electrical coupling should
increase proportionally to cone spacing from the fovea to the periphery (the rod network may have
interesting spatiotemporal properties (see Detwiler et al., 1978), possibly useful for moving patterns).
Several morphological studies have demonstrated apparent connections between cones as well as
between rods and cones in the vertebrate retina (see for instance Raviola and Gilula, 1975). Nelson
(1977) has provided physiological evidence for the cat that cones have inputs from rods, probably
mediated by the rod-cone gap junctions. The above conjecture would explain why coupling of this
type is needed already at the level of the photoreceptors, whereas improvement of signal-to-noise
ratio could be achieved in a simpler way with convergence of signals at a later level in the retina.

2 This is achieved at the expense of a much more extensive loss of high spatial frequencies than in the case of lowpass
filtering before sampling. Localization of an isolated feature like a zero-crossing is, however, rather unaffected by loss
of high spatial frequencies, in the ideal case of small noise level.
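The effect of pooling before sampling can be illustrated with a toy one-dimensional sketch. Everything in it is an assumption made for illustration: photoreceptors sit at integer positions, the stimulus is a single cosine, the coupling is modelled as a plain average over neighboring receptors, and the function name `ganglion_samples` is hypothetical. It is not a model of the retina, only a demonstration that averaging over the sampling interval attenuates a component that would otherwise alias at full amplitude.

```python
import math


def ganglion_samples(freq, spacing, n_samples, pool):
    """Sample cos(2*pi*freq*x) on a coarse grid of `n_samples` points that are
    `spacing` photoreceptors apart.

    pool=False: each coarse sample reads a single photoreceptor.
    pool=True:  each coarse sample averages `spacing` neighboring
                photoreceptors (a crude stand-in for electrical coupling).
    """
    samples = []
    for k in range(n_samples):
        start = k * spacing
        if pool:
            window = [math.cos(2 * math.pi * freq * (start + j))
                      for j in range(spacing)]
            samples.append(sum(window) / spacing)
        else:
            samples.append(math.cos(2 * math.pi * freq * start))
    return samples


if __name__ == "__main__":
    # 0.45 cycles per photoreceptor lies far above the coarse grid's Nyquist
    # limit of 1 / (2 * 8) cycles per photoreceptor.
    point = ganglion_samples(0.45, 8, 50, pool=False)
    pooled = ganglion_samples(0.45, 8, 50, pool=True)
    print(max(abs(s) for s in point))   # aliased component at full amplitude
    print(max(abs(s) for s in pooled))  # strongly attenuated by pooling
```

The box average used here is only one possible coupling profile; a space constant that grows with receptor spacing, as conjectured above, would play the same role.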

6.1 Significance for information processing and machine vision 

There are various methods for reconstructing the original signal at high resolution by interpolating
values measured at widely spaced intervals. The best known approach to this problem is based on
the Shannon sampling theorem and on its various extensions. For static images interpolation of this
type can provide a resolution much higher than the original sampling grid. Since in our framework
the position of zero-crossings (and not the grey level values) is important, Hildreth and Poggio have
examined the problem of interpolating the values of the ∇²G convolution in order to obtain precisely
the location of zero-crossings. Analytical arguments, supported by computer experiments, have shown
that the position of a zero-crossing can be interpolated precisely in terms of very simple interpolation
functions, even by linear interpolation. For time-varying images the situation is more complicated. In
the classical sampling theorem, interpolations in space and time are performed independently, since
the temporal dependence of the input is not constrained in any way. Interpolation algorithms based
on the constant velocity assumption discussed earlier could achieve higher spatiotemporal resolution
for objects in motion, as long as the constant velocity assumption is not grossly incorrect, despite
low spatial and temporal sampling rates. Positional acuity for the image features, e.g., the
zero-crossings, although desirable, is not the only goal of this spatiotemporal interpolation stage. A filter
that correctly interpolates the sampled image automatically avoids any defect in the representation of
the image since it reconstructs the “original” input. It avoids in particular motion smear; and it “fills
in” possible gaps either in space or time, where or when the sampled input is missing. Real time
vision machines may well need such an interpolation stage and it will be interesting to see the form
and the performance of a computer implementation. In particular, the “gap junction” scheme for
avoiding aliasing with sparse sampling intervals may be usefully implemented in future CCD devices.
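As a minimal sketch of the linear-interpolation idea mentioned above, the following locates zero-crossings between samples by intersecting the chord through two neighboring samples with zero. The function name `zero_crossings` and the sine test signal are assumptions chosen for illustration; the signal is not the output of a ∇²G convolution, and this is not the procedure used by Hildreth and Poggio.

```python
import math


def zero_crossings(xs, ys):
    """Estimate zero positions of a sampled function by linear interpolation.

    xs: increasing sample positions; ys: sampled values (same length).
    """
    found = []
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if y0 == 0.0:
            found.append(x0)  # the sample lies exactly on a zero
        elif y0 * y1 < 0.0:
            # Sign change: intersect the chord through the two samples with 0.
            found.append(x0 - y0 * (x1 - x0) / (y1 - y0))
    if ys and ys[-1] == 0.0:
        found.append(xs[-1])
    return found


if __name__ == "__main__":
    xs = [0.25 * k for k in range(9)]      # coarse grid, spacing 0.25
    ys = [math.sin(x - 0.9) for x in xs]   # true zero-crossing at x = 0.9
    print(zero_crossings(xs, ys))
```

For a smooth signal the estimate falls much closer to the true crossing than the sample spacing, which is the sense in which position can be recovered at a finer resolution than the grid.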


Acknowledgements. We are grateful to E. Grimson for reading the paper, to G. Weinraub for drawing
the figures and to P. Rogers for her help with the manuscript.
Appendix 1a

Logan’s results apply to $B_\infty(\lambda)$ functions, i.e., the restrictions to the real line of entire functions
of exponential type $\lambda$ whose growth (on $\mathbb{R}$) is less than exponential. In particular, they apply to
periodic functions with the exception of theorem 4 (Logan, 1977), which can be specialized to periodic
functions (Logan, personal communication). If we restrict ourselves to trigonometric polynomials, it is
possible to illustrate Logan’s results in a simple way. It should be stressed, however, that trigonometric
polynomials are a very special case and in general erroneous inferences can be made from their
special properties. With this “caveat” in mind, let us consider the real band limited function

$$h(t) = \sum_{n=-N}^{N} C_n e^{int}, \qquad C_n = \overline{C_{-n}} \tag{1}$$

which can be extended to the complex plane as

$$h(z) = \sum_{n=-N}^{N} C_n e^{inz}$$

$h(z)$ is for instance bandpass with one octave bandwidth if

$$C_n = 0 \quad \text{for } |n| < \frac{N}{2}$$

The complex free zeros of $h(z)$ are the complex zeros of $h(z)$ in common with its Hilbert transform
$\hat{h}(z)$ where

$$\hat{h}(z) = \sum_{n=-N}^{N} \hat{c}_n e^{inz}, \qquad \hat{c}_n = -i\,\mathrm{sign}(n)\,C_n \tag{2}$$


Let us define, given $h(z)$,

$$P(z) = \sum_{n=A+1}^{N} C_n e^{inz}, \qquad N(z) = \sum_{n=-N}^{-(A+1)} C_n e^{inz} \tag{3}$$

where $A$ is the low-frequency boundary of the spectrum of $h(z)$ (assumed in the following to be bandpass).

Then the free zeros of $h(z)$ are completely characterized by the following three equivalent formulations.
The free zeros of $h(z)$ are those $z^*$ such that:

$$P(z^*) = 0, \qquad N(z^*) = 0 \tag{a}$$

$$h(z^*) = 0, \qquad P(z^*) = 0 \tag{b}$$

$$P(z^*) = 0, \qquad P(\bar{z}^*) = 0 \tag{c}$$

Observe that if $z$ is a zero, $\bar{z}$ is also a zero of $h(z)$; and if $z$ is a zero, $z + 2k\pi$, $k$ an integer, is also a
zero.

The coefficients $C_n$ of $h(z)$ may be determined by the $2N$ roots of $h(z)$ as the solutions of the
system of $2N$ equations

$$
\begin{aligned}
\sum_{n=-N}^{N} C_n e^{inz_1} &= 0 \\
&\;\;\vdots \\
\sum_{n=-N}^{N} C_n e^{inz_{2N}} &= 0
\end{aligned}
\tag{4}
$$

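Let us now rewrite

$$h(z) = \sum_{n=-N}^{N} C_n e^{inz}$$

as

$$h(z) = \zeta^{-N} \left( \sum_{n=0}^{2N} g_n \zeta^n \right) \tag{5}$$

with

$$\zeta = e^{iz}, \qquad g_n = C_{n-N}, \qquad \mathrm{Re}[z] \in [0, 2\pi], \qquad N = 2M$$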

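Thus the nontrivial zeros of $h(z)$ coincide with the zeros of $\sum_{n=0}^{2N} g_n \zeta^n$, that is, a polynomial of
order $2N$. If the $2N$ roots $\zeta_i$ were known, it would be possible to write $2N$ equations in the
$2N + 1$ real unknowns ($C_n$):

$$
\begin{aligned}
\sum_{n=0}^{2N} g_n \zeta_1^n &= 0 \\
&\;\;\vdots \\
\sum_{n=0}^{2N} g_n \zeta_{2N}^n &= 0
\end{aligned}
\tag{6}
$$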
with $\zeta_i = e^{iz_i}$.

Since the determinant of the roots is a Vandermonde determinant, it always has maximum rank if
the roots are distinct. The question is under which conditions the real roots alone determine, apart
from a multiplicative constant, the set of $C_n$, i.e. $h(z)$. Clearly, multiple zeros, in particular multiple
real zeros, cannot be allowed. Observe that if more than $2N$ real zero-crossings were available (in
a basic period) then $h \equiv 0$.
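The statement that a full set of distinct zeros fixes the coefficients up to a multiplicative constant can be illustrated numerically. The sketch below is an illustration only: the function name `coefficients_from_zeros` is hypothetical, and instead of solving the homogeneous Vandermonde system it simply expands the product of the factors (ζ - ζ_i), which yields the same polynomial up to scale when all zeros are known.

```python
import cmath
import math


def coefficients_from_zeros(zeros):
    """Coefficients (lowest power of zeta first) of prod_i (zeta - zeta_i),
    where zeta_i = exp(i * z_i) for each zero z_i in one period."""
    coeffs = [1 + 0j]
    for z in zeros:
        root = cmath.exp(1j * z)
        # Multiply the current polynomial p(zeta) by (zeta - root).
        shifted = [0j] + coeffs                     # zeta * p(zeta)
        scaled = [root * c for c in coeffs] + [0j]  # root * p(zeta)
        coeffs = [s - r for s, r in zip(shifted, scaled)]
    return coeffs


if __name__ == "__main__":
    # h(t) = cos(t) has N = 1, C_{-1} = C_1 = 1/2, C_0 = 0, and two real
    # zeros per period, at t = pi/2 and t = 3*pi/2.
    g = coefficients_from_zeros([math.pi / 2, 3 * math.pi / 2])
    print(g)  # proportional to (C_{-1}, C_0, C_1), up to rounding error
```

In this example all zeros are real, so the real zero-crossings alone suffice; the question raised in the text is when that remains true for a bandpass signal that also has complex zeros.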

Under the bandpass condition ($C_n = 0$ for $n < A$) there are at least $2A$ real zero-crossings per
period. The real unknowns are $2b$, $b = N - A$, that is the number of non-zero $C_n$ between $N$ and
$A$, counted twice because they are complex numbers. A sufficient condition to ensure that there are
enough zero-crossings, and thus equations, is $A = M$, i.e., $C_n$ (for $n > 0$) all non-zero in $(M, 2M]$.
Notice that $[M, 2M]$, i.e., one octave bandwidth, would not be sufficient: in this case there would
be at least $2M$ real roots but $2(M + 1)$ unknowns $C_n$. The matrix associated to the homogeneous
equation in the “roots”

$$
\begin{pmatrix}
e^{-i2Mt_1} & \cdots & e^{-i(A+1)t_1} & e^{i(A+1)t_1} & \cdots & e^{i2Mt_1} \\
\vdots & & & & & \vdots \\
e^{-i2Mt_{2M}} & \cdots & e^{-i(A+1)t_{2M}} & e^{i(A+1)t_{2M}} & \cdots & e^{i2Mt_{2M}}
\end{pmatrix}
$$

has rank at most $2M - 1$ (since there exists $C_n$ such that $\sum C_n e^{inx}$ vanishes for $x = t_1, \ldots, t_{2M}$)
and this would just not suffice to specify the $C_n$ modulo a multiplicative constant.

Although the less-than-1 octave condition is sufficient to ensure enough zero crossings, it is by no 
means necessary. In fact, there are classes of bandpass signals with a larger bandwidth and still enough 
zero-crossings. 

In any case, even when there is a sufficient number of zero-crossings, the question still remains
of whether the determinant of the matrix of the “roots” $|e^{int_i}|$ has maximum rank ($2M - 1$) and
therefore the $C_n$ can be determined (modulo a multiplicative constant). If the rank is less than
$2M - 1$ then the $C_n$ are not uniquely determined and as a consequence $h(z)$ is not determined by its
real roots. Logan (1977 and personal communication) has proved that

a) if a free zero exists then h(z) is not uniquely determined by its real roots and 

b) if there are no free zeros, h(z), provided its bandwidth is appropriate, is determined, modulo a
multiplicative constant, by its real zero-crossings.

In the following, we will outline Logan’s main theorems for the case of trigonometric polynomials.

Theorem 1 

If $h(z)$ has 1 or more free zeros, the rank $r$ of the determinant of the roots is $r < 2M - 1$.

Proof 

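$h(t)$ can be written as

$$
\begin{aligned}
h(t) &= P(t) + N(t) \\
&= e^{-i2Mt} \Big\{ \sum_{n=0}^{M-1} g_n e^{int} \Big\} + e^{i(M+1)t} \Big\{ \sum_{n=0}^{M-1} p_n e^{int} \Big\} \\
&= g_{M-1}\, e^{-i2Mt} \prod_{j=1}^{M-1} \left( e^{it} - e^{i\alpha_j} \right) + p_{M-1}\, e^{i(M+1)t} \prod_{j=1}^{M-1} \left( e^{it} - e^{i\beta_j} \right)
\end{aligned}
\tag{8}
$$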


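If $e^{i\theta}$ is a free zero of $h(t)$ then we can divide $h(t)$ by the real function

$$
f(t) = \left( e^{it} - e^{i\theta} \right)\left( e^{-it} - e^{-i\bar{\theta}} \right)
= \left( 2i\, e^{i\frac{t+\theta}{2}} \sin\frac{t - \theta}{2} \right)\left( -2i\, e^{-i\frac{t+\bar{\theta}}{2}} \sin\frac{t - \bar{\theta}}{2} \right)
= A \sin\frac{t - \theta}{2} \sin\frac{t - \bar{\theta}}{2}
\tag{9}
$$

with $A$ real.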

The resulting $h(t)/f(t)$ is still a periodic bandpass function of the form

$$\frac{h(t)}{f(t)} = \sum_{n=-2M}^{-M} g'_n e^{int} + \sum_{n=M}^{2M} g''_n e^{int} \tag{10}$$

and actually of reduced bandwidth. Multiplication of $h(t)/f(t)$ by any arbitrary $[a - \cos(t - \sigma)]$, $a > 1$,
which can always be written as $C \sin\frac{t - \theta'}{2} \sin\frac{t - \bar{\theta}'}{2}$ for a suitable complex $\theta'$, provides a periodic
bandpass function with the same bandwidth as the original $h(t)$ but different from it despite the same
real zeros. Notice that if $e^{i\theta}$ is not a free zero, $h(t)/f(t)$ will no longer be a periodic bandpass function.
This means that the determinant associated with the homogeneous equation 7 has at most rank $r = 2M - 2$.


Theorem 2 

If $h(t)$ has no multiple and no free zeros the rank of the determinant of the real “roots” is
$r = 2M - 1$.

Proof 

Clearly $r$ cannot be $r > 2M - 1$. If $h_1$ and $h_2$ have the same bandwidth and the same real zeros,
then

$$h_1 \hat{h}_2 + \hat{h}_1 h_2 = \sum^{2M-1} g_n e^{int} \tag{11}$$

$$h_1 \hat{h}_2 - \hat{h}_1 h_2 = \sum^{2M-1} p_n e^{int} \tag{12}$$

as it is easy to check by substitution of equation (2). If the real zeros are $2M$ in number and distinct,
the Vandermonde determinant associated to the real roots of equation 12 is different from zero; thus,
the unknowns $g_n$ are identically zero. The same argument implies that all $p_n$ are also identically zero.

Thus, $\dfrac{h_1}{\hat{h}_1} = \dfrac{h_2}{\hat{h}_2} = M(t)$.
Now $M(t)$ is any function with the same zeros (real and complex) of $h_2$. But $h_1$ is a bandlimited
function $h_1(t) = \sum_{n=-2M}^{2M} C_n e^{int}$ which is uniquely determined (apart from a multiplicative constant)
by its $4M$ real and complex zeros. Thus $h_1$ and $h_2$ must coincide identically and the theorem follows.
The theorem can be generalized allowing for real zeros.

Finally, a short remark about the multiple and free zero condition. It is rather intuitive that
multiple and free zeros are not generic; assume, for instance, that the polynomial $\sum_{n=-N}^{N} C_n e^{int}$ has a
free zero. It is enough to perturb one of the coefficients $C_n$ to annihilate the free zero. Similarly, if
the trigonometric polynomial is a sample function of a random process, the coefficients $C_n$ would be
random numbers, as well as the zeros of the associated polynomial $\prod^{2N} (\zeta - \zeta_i)$. The probability that
a zero is free (i.e. $\zeta_i = \rho e^{i\theta}$ is free iff $\frac{1}{\rho} e^{i\theta}$ is also a zero) is usually very low.


Appendix 1b

Logan’s result can be extended to the case of a two-dimensional entire function $f(x, y)$ if it is
bandpass in $x$ with a band-width strictly less than an octave and band-limited in $y$. In this case, the
restriction of $f$ to a one-dimensional line $l_x$ in the $x, y$ plane parallel to the $x$ axis will be bandpass
with less than an octave band-width. Provided the free-zero condition is met, Logan’s theorem tells
us that the zeros of $f$ along $l_x$ determine $f$ there up to a multiplicative constant. To determine $f$
everywhere up to a multiplicative constant, these parallel slices must be tied together.

The following lemma shows that Logan’s theorem can be invoked for $f$ restricted to a line $l_\theta$ which
is not parallel to the $x$ axis. $l_\theta$ will intersect all slices $l_x$ parallel to the $x$ axis, so determining $f$ up to a
multiplicative constant on $l_\theta$ determines $f$ up to the same constant along each of the slices $l_x$.

Lemma 

If $f(x, y)$ is ideally bandpass with band-width strictly less than an octave in $x$ and band-limited in $y$
then there is an $\epsilon > 0$ such that $f$ along all slices $l_\theta$ which make an angle $\theta < \epsilon$ with the $x$ axis will
be bandpass with band-width less than an octave.
Proof

The support of the Fourier transform of $f$ is confined in $\omega_x$ to the intervals $I_1 = (-2a + \delta, -a - \delta)$
and $I_2 = (a + \delta, 2a - \delta)$ and in $\omega_y$ to the interval $J = (-b, b)$ for some positive $\delta$, $a$, and $b$.
Observe that the support of the Fourier transform of a slice $l$ through $f$ is confined to the projection
of the support of the Fourier transform of $f$ onto the $\omega_l$ axis. The rectangles $I_1 \times J$ and $I_2 \times J$ will
project into the intervals $(-2a, -a)$ and $(a, 2a)$ on $\omega_l$ provided that $l$ makes a sufficiently small angle
with the $x$ axis.
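The projection argument can be checked numerically with a short sketch. The function names `slice_band`, `is_sub_octave` and `max_angle` are hypothetical, and the sketch tests the slightly different condition that the projected band has an upper edge below twice its lower edge, rather than containment in (a, 2a). Under that condition the admissible angle works out to atan(δ/b).

```python
import math


def slice_band(a, delta, b, theta):
    """Projection of the rectangle (a+delta, 2a-delta) x (-b, b) onto a
    frequency axis tilted by `theta` radians (0 <= theta < pi/2) from the
    omega_x axis. Returns the (low, high) edges of the projected band."""
    c, s = math.cos(theta), math.sin(theta)
    low = (a + delta) * c - b * s
    high = (2 * a - delta) * c + b * s
    return low, high


def is_sub_octave(a, delta, b, theta):
    """True if the projected positive-frequency band spans less than an octave."""
    low, high = slice_band(a, delta, b, theta)
    return low > 0 and high < 2 * low


def max_angle(delta, b):
    """Largest tilt for which high < 2 * low still holds.
    From (2a - delta) c + b s < 2 (a + delta) c - 2 b s one gets
    tan(theta) < delta / b, independent of a."""
    return math.atan(delta / b)


if __name__ == "__main__":
    a, delta, b = 1.0, 0.1, 0.5
    eps = max_angle(delta, b)
    print(eps, is_sub_octave(a, delta, b, 0.5 * eps),
          is_sub_octave(a, delta, b, 1.5 * eps))
```

The smaller the margin δ or the larger the bandwidth b in y, the narrower the range of slice directions for which the one-dimensional theorem can still be invoked.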
Appendix 2


We consider a one dimensional pattern g(x). Arbitrary, non rigid movement of this pattern produces
a spatiotemporal image g(x, t). Rigid movement of the same pattern at constant speed gives an image
g(x, t) = g(x - vt). We state here the classical sampling theorem for the first case and an appropriate
modification of it for the second case.

Theorem 1 (classical sampling theorem) 

If a signal $g(x, t)$ is bandlimited in spatial and temporal frequencies it can be recovered exactly by
independent interpolation in space and time of its sampled values, provided that the sampling separations
$\Delta\xi$ and $\Delta\tau$ are such that $\Delta\xi < 1/(2 f_c^x)$ and $\Delta\tau < 1/(2 f_c^t)$, where $f_c^x$ and $f_c^t$ are the spatial and
temporal bandwidths.

Theorem 2 (Crick et al., 1981; Fahle & Poggio, 1981)

Assume that the spatiotemporal signal is g(x, t) = g(x - vt). The function g can then be reconstructed
at the desired resolution from its spatial (temporal) samples. The required sampling density can be
decreased arbitrarily by knowledge of the velocity v. If only the sign of the velocity is available the
maximum sampling distance can be twice the classical limit for stationary patterns.
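One way to see why knowledge of v relaxes the sampling requirement is to list the positions on the rigid pattern that a sparse set of sensors probes over time. The sketch below is an illustration under the constant-velocity assumption only; the function names `pattern_sample_positions` and `largest_gap` are hypothetical, and it does not implement the interpolation filter itself.

```python
def pattern_sample_positions(sensor_xs, sample_ts, v):
    """Positions on the rigid pattern g that are probed when sensors at
    `sensor_xs` are read at times `sample_ts`, given g(x, t) = g(x - v*t)."""
    return sorted({x - v * t for x in sensor_xs for t in sample_ts})


def largest_gap(positions):
    """Largest distance between neighboring probed positions."""
    return max(q - p for p, q in zip(positions, positions[1:]))


if __name__ == "__main__":
    sensors = [0.0, 1.0, 2.0]          # sparse spatial grid, spacing 1
    times = [0.0, 0.25, 0.5, 0.75]     # stroboscopic sample times
    moving = pattern_sample_positions(sensors, times, v=1.0)
    static = pattern_sample_positions(sensors, times, v=0.0)
    print(largest_gap(moving), largest_gap(static))
```

With a known nonzero velocity the temporal samples fill in the spatial gaps between sensors, whereas for a stationary pattern repeated readings add nothing new; an interpolation filter matched to v exploits exactly this redundancy.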

Comments 

a) The proof of these results can be easily obtained from diagrams in the $f_x$-$f_t$ Fourier plane (see
Fig. 9; Crick et al., 1981).

b) Theorem 1 requires the function g(x, t) to be bandlimited before sampling takes place, since
overlap of the frequency lobes as an effect of sampling usually leads to an irretrievable loss of
information. This condition is not needed in theorem 2. Overlap never occurs (for infinitely long motion)
even when the pattern g(x) is not bandlimited in spatial frequency. Any desired part of the original
spectrum can be recovered exactly (without aliasing) by an appropriate interpolation filter.

c) The spatiotemporal filter implementing the interpolation depends on v. Suppose, for instance, that
an interpolation scheme is endowed with direction selective properties (i.e. that it uses information about the
sign of v): it can be shown that the new spatiotemporal filter is obtained by adding to the spatiotemporal
impulse response its Hilbert transform with a sign controlled by the sign of v (in the case of
Fig. 8 the Hilbert transform of the spatial point spread function is an odd function).